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abstract 


The minimal polynomial extrapolation (MPE) and reduced rank extrapolation (RRE) 
are two very effective techniques that have been used in accelerating the convergence 
of vector sequences, such as those that are obtained from iterative solution of linear 
and nonlinear systems of equations. Their definitions involve some linear least 
squares problems, and this causes difficulties in their numerical implementation. In 
this work timewise efficient and numerically stable implementations for MPE an 
RRE are developed. A computer program written in FORTRAN 77 is also appended 
and applied to some model problems. 
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1. INTRODUCTION 

The minimal polynomial extrapolation (MPE) of Cabay and Jackson [2] and the reduced rank 
extrapolation (RRE) of Eddy [3] and MeSina [9] are two methods used in accelerating the convergence 
of a large class of vector sequences. In particular, they are employed for accelerating the convergence 
of fixed point iterative techniques for linear or nonlinear systems of equations, such as those that arise 
in the discrete solution of continuum problems. 

A unified treatment of these and other extrapolation methods has been given in the survey paper 
of Smith, Ford, and Sidi [19], where some numerical testing for them is also provided. Detailed con- 
vergence analyses for MPE and RRE have been presented in Sidi [12], Sidi and Bridger [16], and Sidi 
[13], and we shall mention some of the results that follow from these analyses later in this work. Also, 
both MPE and RRE are very closely related to some well known Krylov subspace methods when they 
are applied to linearly generated vector sequences, and this subject is explored in detail in [13], In fact, 
MPE and RRE are equivalent to the Amoldi method and generalized conjugate residuals (GCR), 
respectively, when they are all applied to linear systems of equations starting with the same initial 
approximation. For the method of Amoldi see Saad [10], and for GCR see Eisenstat, Elman, and 
Schultz [4], We also mention that the conjugate gradient type method of Axelsson [1], the method of 
Young and Jea [22] that has been called ORTHODIR, and the recent generalized minimal residual 
method (GMRES) of Saad and Schultz [11] are all equivalent to GCR, and are used in solving linear 
equations. Recursion relations that exist amongst various approximations that are obtained from both 
methods are discussed in the paper by Ford and Sidi [6], where the existence of an interesting four- 
term lozenge recursion is shown. MPE and RRE have been employed successfully in Sidi and Celes- 
tina [17] in accelerating the convergence of some finite difference solution techniques in large scale 
computational fluid dynamics problems. Finally, the application of MPE and RRE and other vector 
extrapolation methods to the iterative solution of consistent singular linear systems has been con- 
sidered in Sidi [15], where this approach is shown to be sound theoretically, and precise convergence 
analyses are also provided 

The definitions of MPE and RRE involve the solution of a linear least squares problem, the 
number of equations in this problem being equal to the dimension of the vectors in the given 
sequence. Since, in general, this dimension may be very large, as it is, for example, in three- 
dimensional computational fluid dynamics problems, the matrix of the least squares problem may be 
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vcry large. Thus, if standard linear least squares packages are used, the time and core memory require- 
ments in the implementation of MPE and RRE may become prohibitive. To circumvent this problem, 
the solution of the linear least squares problem was achieved in [17] by solving the corresponding nor- 
mal equations that is much less costly than using least squares packages. This approach proves to be 
quite efficient when the amount of extrapolation is not very large. When the amount of extrapolation is 
increased, however, the accuracy decreases, as the normal equations become very ill conditioned. 

In the present work we propose new implementations for MPE and RRE, which are very inex- 
pensive as far as both time and core memory requirements are concerned, and are stable numerically 
as the amount of extrapolation is increased. These implementations are also quite interesting 
mathematically, as they allow one to compute exactly (or estimate) the accuracy achieved in the extra- 
polation process without actually computing the residuals at each stage. This can be employed to 
further reduce the cost of implementation. 

The plan of this paper is as follows: In Section 2 we briefly review the definitions of MPE and 
RRE. In Section 3 we consider the application of MPE and RRE to vector sequences that are gen- 
erated by iterative solution of linear systems as this provides the motivation for different modes of 
usage of the methods. We devote Sections 4-6 to the development of the new implementations of 
MPE and RRE and the description of the mathematical features of these implementations. In Section 4 
we give the details of the new implementations. One of the crucial ingredients of these implementa- 
tions is the efficient solution of the least squares problems by use of QR factorization. In Section 5 we 
show how, in these new implementations, the / j-norms of the residuals can be computed exactly for 
linear systems (or estimated for nonlinear systems) without doing extra vector computations. This 
enables us to assess the accuracy of the extrapolation without actually carrying it out, and can be used 
to reduce the amount of computation drastically. In Section 6 we discuss the operation counts and the 
storage requirements for the new implementations. In Section 7 we discuss some practical matters 
concerning the efficient use of MPE or RRE or any other vector extrapolation methods. Finally, in 
Section 8 we give some numerical results obtained by applying MPE and RRE through their new 
implementations to certain model problems. A computer program written in FORTRAN 77 that 
implements MPE and RRE is provided in the appendix. 
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2. REVIEW OF MPE AND RRE 


Let xq,x\,xi,..., be a given sequence of W-dimensional column vectors, and denote its limit or 
antilimit by s. The vectors x s are assumed to be complex, in general. Define 


u, - Ax, = x 1+] -x, and w, = Au, 

= A 2 Xj, i=0, 1,2 

(2.1) 

Define the NxQ'+l) matrices U and by 




• ' !«»+/] 

(2.2) 

and 



W<"> = [w* lw„ + , 1 • ■ 

’ Iw.+y] . 

(2.3) 


2.1 Definition of MPE 

For MPE the approximation s„ k to s, the desired limit or antilimit, is defined by 

k 

S n,k = ZyjXn+j ■ (2.4) 

;=o 

where the y ; are determined as follows: 

(i) Use the least squares method to solve the overdetermined and, in general, inconsistent linear 
system 

Ul n } l c = -u n+k . (2.5) 


where c = (co,c'i, . . . ,c *_ \) r . 

(ii) Set c* = 1 , and compute the y, by 

Yy = ~T~' 0<jZk, 

i =0 


assuming that £c, * 0. When this condition is not satisfied s n k does not exist. 
1=0 


( 2 . 6 ) 


2.2 Definition of RRE 

For RRE the approximation s n k to s , the desired limit or antilimit, is defined by 

k - 1 

S n,k 'i' J » (2.7) 

«=o 


where the ^ are determined by solving the overdetermined and, in general, inconsistent linear system 
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W[*)^ = -u H , (2.8) 

with £ = (£o.£i> ■ • • .£*-i) T . us i n 8 the least squares method. Since a least squares solution to (2.8) 
always exists, s n k always exists. In particular, s nk exists uniquely when the matrix has full rank, 
i.e., rank (Wi"_\ ) = k, or, equivalently, when the vectors w„ +t _i are linearly indepen- 

dent. It can easily be shown that rank (W^J, ) = k, thus s n k exists uniquely, when rank (U[ n) ) = *+l. 

There exists an equivalent formulation of RRE that seems to be more suitable for computer 
implementation. It also has the advantage of unifying most of the algorithmic aspects of MPE and 
RRE. In this formulation k is of the form given in (2.4); only this time the y ) are obtained by the 
least squares solution of the overdetermined and, in general, inconsistent linear system 

U[ n) y=0, (2.9) 

where y= (Yo-Yi . . . . ,y k ) T , subject to the constraint 

k 

lY, = l- (2.10) 

j = o 

(Note that the y y in MPE satisfy (2.10) automatically, as can easily be seen from (2.6).) 

Remarks: 

* 

(1) It is important to realize that the y, in s n k = XY/ x » + i depend on both n and k. 

7=0 

(2) In most applications, N, the dimension of the vectors x ,, is much larger than k, so that the 
matrices have many more rows than columns. Therefore, there is great need to reduce the 
amount of numerical work with the columns of the matrices t/j"\ 

3. APPLICATION OF MPE AND RRE TO LINEAR SYSTEMS 

Consider the linear nonsingular N-dimensional linear system 

x = Ax +b , (3.1) 

where A is an NxN matrix and b is an N-dimensional column vector. Pick an initial vector *o, and 
generate the vectors xy,X2,—, by the iterative scheme 


Xi+i=Axi + b, i=0,l,... 


(3.2) 
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The solution s of (3.1) is now the limit of the sequence xq,x\ .* 2 . - when tbe latter converges, other- 
wise. s is the antilimit. 

Let k 0 be the degree of the minimal polynomial of the matrix A with respect to the vector x„-s. 

Then the following statements are true. 

(i) s n,k 0 is uniquely defined both for MPE and RRE, and 

Sn,k,=S ■ (3.3) 

Also the linear systems in (2.5), (2.8), and (2.9) are consistent for k = k 0 , even though they may 
be overdetermined. This is a consequence of the fact that the vectors u„ +; , 0 <j <k 0 -l, are 
linearly independent, and u„ +to lies in their span. (See [13, Section 2.2].) 

(ii) For k < k 0 , s nk is uniquely defined for RRE. For MPE, however, s Hk may fail to exist when 

k < k 0 ■ When the matrix C = / -A has positive definite hermitian part, s„ k exists uniquely for 

MPE also for k < k 0 . (See [13, Section 2.2].) More generally, s nk exists uniquely for MPE also 
for k < k 0 , if the eigenvalues of C all lie on one side of a straight line through the origin in the 
complex plane, or, equivalently, if they all he in an open sector S = {\i\ larg p-0 1 < re/ 2}, for 
some 0, -n < 0 < Jt. This result can be proved exactly as Theorem 2.2 in [13] with C there 
replaced by e~‘ 6 C. 

(iii) When the Amoldi method and GCR are used in solving the linear system Cx = b, where 

C = l -A, with x n as the initial vector, they become equivalent to MPE and RRE, respectively. 

Specifically, the approximations obtained from the Amoldi method and GCR are exactly 

. i » s » .2 that are produced by MPE and RRE, respectively. (See [13, Section 2.3].) 

(iv) If the distinct nonzero eigenvalues of A are denoted X ; , j = 1,2,..., and are ordered such that 


1 X| 1 2 1 X 2 1 I X 3 1 2 • • • • 

(3.4) 

then, provided 


IX.* 1 > lX.jt + ] 1 , 

(3.5) 

and A is diagonalizable, we have 


s„,*-s = 0(lA.* + i 1") as /!—>«, 

(3.6) 


both for MPE and RRE. (The coefficient of I" on the right hand side of (3.6) becomes 
large when the largest eigenvalues Xj.Xj are close to 1.) In view of the fact that 
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x n - s = 0(1^1") as n , we conclude that MPE and RRE arc both true acceleration 
methods. Under the same conditions, if s nk -s is precisely 0(IX* +1 1") as n -> °° , then, the Yy 
for MPE and RRE are such that 

P H - k (X) = SyA 7 = II TT - + 0 <• W**'") as n , (3.7) 

y=o >=i 1 Ki 

i.e., for fixed k and for all sufficiently large n, the polynomial P {n - k) (X) has precisely k zeros that 

tend to X\,X 2 X k . Furthermore, if we denote the zero of P (n ’ k) (X) that tends to Xj by Xj(n), 

then 

Xj(n)-Xj = 0(\X k+l fXjr) as \<j<k. (3.8) 

The proofs of (3.6) and (3.7) have been given in [12, Sections 3 and 4], The proof of (3.8) will 
be published in the future. In case the matrix A in (3.2) is normal, the right hand sides of (3.7) 
and (3.8) can be replaced by 0(1 X.*+i /X* I 2 ") and 0(lX* + i/Xyl 2 "), respectively. (The result in 
(3.6) remains the same, however.) This implies that when A is normal the rates of converge of 
p (n - k \X) and its zeros X ; (n) are twice those that can be achieved otherwise. These results fol- 
low from the corresponding results of Sidi [14]. 

For the most general case in which the matrix A is not diagonalizable, the results in (3.6)-(3.8) 
need to be modified considerably. For a complete treatment of this case see [16, Sections 2,3, 
and 5], where modifications of (3.6) and (3.7) are given. The modification of (3.8) will be pub- 
lished in the future. 

A direct consequence of the result given in (3.6) is that better accuracy may be obtained if extra- 
polation is preceded by a number of fixed point iterations. This has indeed been observed 
numerically both for linear and nonlinear problems. We shall comment on this again in Section 
7. 

(v) Let us denote C = I -A. Then 5 is the solution to Cx = b. Denote by n k the set of all polynomi- 
als Q k (X) of degree at most k that satisfy £>*(0) = 1. Consider now s n k as obtained by applying 
MPE or RRE to the vector sequence x 0 ,x i .... . Then 

llr(j II *)ll <( min H(2*(C)II )llr(xjll for RRE, (3.9) 

G. 6 

where r(x) = Ax+b—x = b—Cx = —C (x—s) is the residual for x, and II • II is the / 2 vector norm, 
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or the matrix norm induced by it. (In fact, II r(s H *)ll , k =0,1,2,..., is a monotonically decreas- 
ing sequence for RRE.) Similarly, if C h = y(C+C*), the hermitian part of C, is positive 
definite, then 

IIC£ / 2 (s„.t-s)ll *P( min IIG*(C)II )IICi / 2 (^-j)ll forMPE, (3 10) 

Q, e «. 


where 


P = 


L>/cond(C h ) 


if C is normal 
otherwise. 


(3.11) 


with L = IICh l/2 C C h 1/2 II > 1. Note that both II r(jc) 1 1 and IICh /2 x II are true norms forx Two 
types of bounds for min ll£*(C)ll , in case C h is positive definite, are given in [13, Section 4], 

and these can be used to derive upper bounds for II r(j nJk ) 1 1 and IICh /2 (5 n4 -s)ll forfixednand 
increasing k. For details see (13). These bounds are employed in [17] to justify the use of the 
extrapolation strategy that has been called "cycling" in [19] and all subsequent publications. 

Finally, analogous and almost identical results exist for the case in which the system in (3.1) is 
singular but consistent, so that it has an infinity of solutions. In this case the limit or antilimit depends 
onxo m a very specific manner. For details, see [15]. 


Remark: The various Krylov subspace methods like the Amoldi method and GCR and others can be 
applied only to linear systems. Acceleration methods such as MPE and RRE, however, can be applied 
to nonlinear systems as well as linear ones. The reason for this is that, unlike the Krylov subspace 
methods, MPE and RRE are defined exclusively in terms of the given vector sequence, which may be 
generated, for example, by an iterative method. Whether the vector sequence is generated linearly or 
nonlinearly is irrelevant to the definitions of MPE and RRE and other vector extrapolation methods. 
This is a very important property of vector extrapolation methods. 


4. IMPLEMENTATION OF MPE AND RRE 
4.1 General Considerations 

As we have seen in Section 2, both MPE and RRE entail linear least squares problems in their 
definitions. There is. therefore, an immediate need for the efficient solution of these problems. We 
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propose to solve these problems by applying the QR factorization to the matrices U[ n) . 

To keep the notation simple we shall set n = 0 everywhere, and denote the matrices £/j 0) by Uj. 
This amounts to simply renaming x„ and calling it x 0 . 

We assume that the vectors u 0 ,ui «* are linearly independent so that the Nx(k+\) matrix U k 

is of full rank *+1. The case in which u 0< u\ u* are linearly dependent will be discussed later in 

this section. We recall that for the linear system in (3.1) this assumption is valid when k < k 0 , where 
*o is the degree of the minimal polynomial of the matrix A with respect to the vector x 0 -s. Therefore, 
there is a unique /Vx(*+1) matrix Q k , 

G* = [<7ol<M' < 41 > 

whose columns q, satisfy 

(QkQj) -Q’iQ] — Sy . ( 4 ’ 2 > 

and a unique (ik+l)x(jfc+l) upper triangular matrix R k , 

r oo r 0 i ro 2 ' ' ' r o * 

Hi r 12 r lk 

r 22 ' ' ' r 2k 

o 

r kk 

with r„ > 0, i =0, such that 

U k = Q k R k . (4-4) 

This QR factorization amounts to orthonormalizing the vectors wo- u i> w 2 t - . * n this order. It is 
important to retain this order, as this enables us to form the QR factorization of (/*+ 1 by appending 
one additional column to Q k to obtain Q k + j, and a corresponding column to R k to obtain R k+i . Need- 
less to say, this results in considerable savings in computing time. 

QR factorization can be performed in different ways. The simplest way is the Gram-Schmidt 
(GS) process for orthonormalization of uq,U\,U 2 ,— • This process is very unstable, however, in the 

sense that the computed vectors qQ,q\,qi are very far from being orthogonal. The modified 

Gram-Schmidt (MGS) process, on the other hand, seems to be quite stable, and is the one that we have 
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preferred. We recall that MGS is entirely equivalent to GS mathematically, and requires the same 
number of arithmetic operations as GS. The two methods are different numerically, however. For 
details, see, e.g., Golub and Van Loan [7, pp. 218-219). 

For the sake of completeness we describe MGS for the case in which the vectors u 0 ,ui,u 2 ,... , 
are introduced one by one and in this order. 



(Here (y,z) stands for the Euclidean inner product y*z, as before.) 

It is easy to sec that, when implementing MGS on a computer, .... and q k can all 

be made to occupy the same storage locations. As we shall see in the next paragraph, the computation 
of so.* can be based on the without the need to save either the xj or the Uj. We can thus let u k 
occupy the same storage locations as the 

QR factorization can also be achieved by using Householder transformations. Although the com- 
puted matrices Q k produced in this approach are closer to unitary than those produced by MGS when 
the / 2 condition number of U k is large, the amount of computing in this approach is about twice that 
required by MGS. We shall elaborate on this further in Section 7. 

W'e now recall from the definitions of MPE and RRE, that the approximations so,* f° r both 
methods can be expressed in the form 

* * 

Jo.* = EVv with XY, = 1 • (4.5) 

J m 0 ;=0 

Assuming that Yo.Yi • • .Y* have 1x60 determined, let us compute • • • >£*-i from 
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^0 = 1— Yo and = ^>-i~ Yy . l£j&k-l. (4.6) 

Then, we can reexpress $<>,* in the form 

*-t 

*o.* = *o + = *o + • (4.7) 

i=0 

where % = ^*-i) T . Substituting now £/*_, = G*_, R k - 1 in (4.7), we obtain 

*-i 

S0.*=X0 + <2*-i(/?*-I^) = *0+Z t 1;?; - (4.8) 

>= 0 

where 

T^ = (/+l)st component ofthe column vector j *0,1, ...,4-1 • (4 9) 

This approach to the computation of s 0ik is very advantageous, as it enables us to overwrite 
x | ,x 2 and u 0 ,u \ , .... and thus saves a lot of storage. 

4.2 Determination of the y ) When rank (U k ) = 1+1 

The only thing that remains to be done now is to determine the y t , and this requires separate 
treatments for MPE and RRE. 


4.2.1 Determination of the y ) for MPE 

As mentioned in Section 2, in order to determine the yj for s 0 , k in MPE we first solve the over- 
determined system 

U k -\c = —u k ( 4 . 10 ) 

by least squares. Since we also assume that the rank of U k is 1+1, we conclude that c is the unique 
solution of the normal equations 

UU U k . x c = -U\. x u k . (4.11) 

Upon invoking t/ t _, = Q k - , R k - X in (4.1 1) and using the fact that Q* k .\ Q k . \ = / ix * = the 1x1 identity 
matrix, and the fact that /?*_ j is a nonsingular matrix, we obtain 

R k -i c — —Qk-i u k ■ (412) 

It is easy to see that 

Q\- i«* = ('’o*.'T* r *-t.*) T 3 P* • (413) 


so that (4.12) becomes 


. 12 - 


R k -\c=~Pk- (4.14) 

This is a linear system of it equations in the k unknowns c 0 ,c c*_i , and its matrix i is upper tri- 

angular. Hence its solution can be achieved easily by back substitution. 

Once Co.Ci....,c*_i are determined, we set c k = 1, and compute the y ) from (2.6), provided 

L c < * °- 

i-0 


4.2.2 Determination of the y ) for RRE 

Again as we mentioned in Section 2, the y ) for s 0 ,t in RRE can be determined by solving the 
overdetermined system 

U k y=0 (4.15) 


by least squares subject to the constraint 

£y, = 1- (4.16) 

y=o 


This amounts to minimizing the positive definite quadratic form yU* k U k y subject to (4.16). Conse- 
quently, the lemma in Appendix A applies, and the y ) can be obtained by solving the linear system of 
k+2 equations 


U\U k Y = Xe 

IY; = 1 

j=C 


(4.17) 


foryo.Yi. • • • *Y*> and Here 


5 = 0.1 1) T • 


(4.18) 


As is stated in the same lemma, X turns out to be strictly positive, and is given by 


X = yU k U k y at the solution. 


(4.19) 


The y ) can be obtained by first solving the linear system 

U‘ k U k d = e (4.20) 


for d = (d 0 ,d ! d k ) T , and letting 
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and finally setting 


X = 



.)* "0 


( 4 . 21 ) 


y=Xd. (4.22) 

As far as the solution of the system in (4.20) is concerned, we accomplish this again by using the 
QR factorization of (/*. Again by Q* k Q k = I( k + im* + i) =the (*+l)x(*+l) identity matrix, we can 
rewrite (4.20) in the form 

R k R k d = e . (4.23) 


This system can be solved by forward and back substitution as the matrix R k is upper triangular. 


4.3 Treatment of the Case rank (U k ) = k 

Up to this point we discussed the case in which the vectors u 0 ,u j , ...,«* are linearly independent. 
Since these vectors are being introduced one by one, we can view this case as adding the vector u k to 

the linearly independent set {u Q ,u x u k -\} and obtaining the linearly independent set {u 0 ,u x u iJ- 

We now consider the case in which {u 0 ,U\,...,u k -\} is a linearly independent set, but {u 0 ,u\, . . . ,u/J 
is not, i.e., rank ( U k ) = k. This exhibits itself through r u = 0 in the QR factorization step. 

If we are applying MPE, then we can compute the y, by solving the (nonsingular) system in 

k 

(4.14) and employing (2.6), provided £c, * 0 there. We then compute so,*' 

i=0 

If we are applying RRE, we can compute so, k as follows: First, by the linear dependence of 

k 

u 0 ,ui u k , there exist constants Oo,ai,...,a*. not all zero such that £a,u, =0. This implies that the 

linear system in (4.15) is consistent. Also we can write U k = Q k R k , where Q k and R k are as in (4.1)- 
(4.3), q 0 ,qi,...,q k -\ are uniquely determined and q k is arbitrary in (4.1), and r u = 0 in (4.3). Multiply- 
ing both sides of (4.15) by Q k , and using the fact that Q k Q k = /<*+ i)x(*+i). we obtain the system of k+2 
equations 

k 

R k y = 0 and £y, = 1- (4.24) 

Now, by r w = 0, this system actually consists of the k+l inhomogeneous equations 
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* 

I p*)Y= 0 and £y, = 1 (4.25) 

o 

in the it +1 unknowns Yo-Yi. • • • <Y*- Since a least squares solution for the linear system in (2.8) always 
exists, a solution for the y, always exists too. Consequently, the equations in (4.25) always have a 
solution forRRE. Once we determine a set of Y/s, we compute j 0< *. 

Comparing (4.25) with (4.14) and (2.6), we see that if Jo,* exists for MPE when rank (U k ) - k, 
then it is equal to jq * for RRE. 

If the vector sequence x 0 ,x\,x 2 is generated as in (3.2), then, as explained in Section 3, 

rank ((/*) = k+\ for k < k 0 , where k 0 is the degree of the minimal polynomial of A with respect to 
x 0 -J. The smallest value of k for which rank ((/*) = k is ko, and at k = ko we already reach the solu- 
tion. i.e., Jo.* 0 = s. That is the first time = 0 occurs, we have Jo.* = j, and stop. 

If the vector sequence xo,x\,x 2 ,..., is not generated linearly, and rank (U k -\) = k, but 
rank(t/*) = k < *+1, then we can compute j 0 ,* first, and then take j 0 ,* or a nearby vector as x 0 , and 
restart the computation. Other strategies for continuing the computation can likewise be devised, but 
we shall not pursue this matter further. 

It should be mentioned, however, that, due to roundoff, the chances of encountering the case 
rank (U k ) < l'+l in practice are extremely small. We have thus not included the treatment of this case 
in the computer program given in the appendix. 

4.4 Summary of Implementations 

We now summarize the major steps of the implementations, as they have been described above. 
We assume that all the matrices U k have full rank. 

Suppose that, starting with x 0 , we have constructed the matrices Q*_t and R k - 1 . 

We now read x* +] and compute u k =x k+i -x k . Following this, using MGS, we compute the 
scalars r 0k ,r lt ,...,r u and the orthonormal vector q k , which we use to augment the matrices (2*-i and 
/?*_, to give Q k and R k , respectively. 

We next proceed to the computation of the y>- For MPE, we first solve the upper triangular kxk 
system in (4.14) for co,Ci,...,c*_i by back substitution, and then use (2.6) to obtain the y ,■ For RRE, 
we solve the (*+l)x((+l) system in (4.23) for d, and then determine the y ; by (4.21) and (4.22). The 
solution of the system in (4.23) can be achieved very simply by forward and back substitution as R k is 
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upper triangular. 

Once the y ) have been dctennined, we compute the %j by (4.6) and the r| ; by (4.9). and linally. 

so.k by (4.8). 

Next we read x k + 2 < and proceed similarly, until a suitable slopping criterion is met. 

It should be noted that, strictly speaking, neither r u nor q k is needed for determining .v«, and 
their computation can be completed after x k +2 has been introduced. In the computer program that we 
give in the appendix, though, we chose to compute r u and q k before the computation of v o.* - 

Finally, it is not difficult to see that these implementations arc very appropriate for vector com- 
puters as their handling of the x ( , u„ and q , can be entirely vectorized. The computer program given in 
the appendix to this work has been written to take full account of this. 

5. ESTIMATION OF RESIDUAL NORMS 

5.1 General Considerations for Linear and Nonlinear Systems 

Let s be the solution of the linear or nonlinear system of equations 

x=F(x), (5 1) 

and let us define the residual for an arbitrary vector x by 

r(x) = F(xy-x . (5.2) 

Let x 0 be a given initial approximation, and generate the sequence of vectors x x ,X 2 , .., according 
to the fixed point iterative method 

*, + , =F( Xj ), > = 0.1 f 5.3> 

Consequently, the residual for x ; is given by 

r(Xj) = F(Xj>-Xj =x J + l -Xj -Uj , (5 A) 

thus is readily available. 

Let us assume that MPE or RRE is applied to the sequence • . and that we arc com- 
puting the sequence s n |,j„ 2 . - ■ Let us assume also that we would like to stop the computation as 

soon as some norm of r(s n k ) becomes < £ for some k, £ > 0 being a preassigned level of accuracy. 
The most direct way of doing this would be by actually computing the vectors 
5„. i ,r (5*. i >.i„, 2,r(s n , 2 ) w hich is very costly. 


-16- 


Indeed, the computation of * involves about k vector additions and k scalar-vector multiplica- 
tions, that of r ($„*). by (5.2), amounts to one additional fixed point iteration and one vector addition, 
and the computation of the norm of r{s H%k ) requires an additional inner product. In addition, the 
number of the vector operations increases with increasing k. In view of this, the most desirable situa- 
tion is one that enables us to estimate some norm of r(s H >t ) without having to compute either s„ k or 
r(s„' k )- 


S2 Residual Computation for Linear Systems 

We now devise a strategy by which the / 2 -norms of the residuals r(j„ *) can be obtained exactly 
without the need to compute either s n>k or /•($„*), when the sequence , is being generated 

linearly by the iterative method in (3.2), i.e., when F(x) = Ax+b in (5.3). The case in which F(x) is 
nonlinear will be considered at the end of this section. 

When F(x) =Ax+b, the residual for an arbitrary vector x, by (5.2), becomes 


r(x) = Ax + b-x . 

Consequently, by (2.10), (3.2), and (2.1), we have 

k 

r(io,*)= XY/«; = u k y, 

7=0 

and the / 2 -norm of r(s Q k ) is thus 

llr(Jo,*)H =(r(s 0 ' k ),r(s 0 ' k )) m =(yU* k U k y) m . 
By invoking U k = Q k R k in (5.7), we obtain 


M*o.*)ll =(Y‘*;**Y) 1/2 . 

We now analyze y* R k R k y for MPE and RRE separately. 


(5.5) 


(5.6) 


(5.7) 


(5.8) 


5.2.1 li -Norm of Residual with MPE 

Let us compute R k y first. By (4. 12)-(4. 14) we have 


[**-l Ip*] 



= 0 . 


k 

By dividing both sides of (5.9) by £c, with c k = 1, and invoking (2.6), we obtain 

i=0 


(5.9) 
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[**-i I P*]Y=0. (5.10) 

Substituting (5.10) in R k y, we finally have 

J?*Y=(0,0 0,r kt y k ) T , (5.1 1) 

from which we obtain 

(y’^*y ) 1/2 = ^iy*i • < 512) 

Consequently, for linearly generated sequences 

\\r(s Q .k)\\ =r u lY*l (5- ] 3) 

exactly, with r(x) as defined in (5.5). 

5.2.2 / 2 -Norm of Residual with RRE 

By (4.19) we have immediately 

(y* U k U k y) 112 = Vx , ( 514 ) 

with X as determined from (4.23) and (4.21). Consequently, for linearly generated sequences 

llr(j 0i t)ll (5.15) 

exactly, with r(x) as defined in (5.5). 

The results given in (5.13) and (5.15) assume exact arithmetic. Due to roundoff errors, however, 
the actually computed residual norms may be getting farther from (5.13) and (5.15), especially when k 
is increasing. In this case it may be appropriate to compute s 0 k and the norm of its residual every 
once in a while to make sure that roundoff has not started to dominate the computations. Although 
such a test is not included in the computer program given in the appendix, it is quite easy to incor- 
porate it there. 

5.3 Practical Residual Estimation in Extrapolation for Nonlinear Systems 

We now consider the problem of error estimation for the case in which F(x) in (5.1) is nonlinear. 
Let us assume that the sequence xo.*i,* 2 > •• > is convergent, its limit, of course, being s, the solution 
of (5.1). Therefore, for n sufficiently large, x n ,x„+\,... , are all very close to s, and we have 

x n+i -s =F'(s)(x n -s) + e n . (516) 

where F"(x) is the Jacobian matrix of the vector valued function F (x), and £„ is a vector whose norm 
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is 0(\\x n -s II 2 ) as n -» <» . This implies that the sequence Xo.*t.*2." . behaves linearly at infinity, 
in the sense that 

x J+ \ = F'(s)x J + (s-F'(s)s) (5. 17) 

for all sufficiently large j. Thus, for n sufficiently large, we can take 

r(s H ' k ) = Ui n) y (5.18) 

c.f., (5.6), and 

llrfo,,*)!! =(yRP*R[ H) y) m , (5.19) 

c.f., (5.8), where we have retained the index n in and The norm in (5.19) is the 

/ 2 -norm as before. Consequently, we can take (5.19) as an estimate for the / 2 -norm of the residual 
r(s„ t ) without having to compute either s n k or r(s nk ), since it is given by (5.12) for MPE and by 
(5.14) for RRE. 

In case n is not large enough, (5.19) may not be very realistic. In this case we may choose to 
compute s n k and r(s n k ) not for all k, but for k =p, 2 p, 3 p,... , say, for some integer p > 1. This obvi- 
ously reduces the cost. 

When we are using MPE or RRE in the cycling mode, which is one of the best modes of usage, 
things become simpler. To see this let us recall how cycling can be performed. 

Step 1. Fix the integer k. Pick s k h xo and set q = 0. 

Step 2. Generate jcj, by (5.3). If llr(.rj^)ll = 1 1 jc x — jr 0 1 1 = ll«oH Iben stop. Otherwise, generate 
by (5.3). 

Step 3. Compute * s o,* by MPE or RRE. 

Step4. Replace xo by .si‘ ?+1) , and q by q+ 1, and go to Step 2. 

(That r(s^) = uq in Step 2 follows from (5.4).) 

Consequently, no extra computation for residuals is necessary, as uq is the true residual in each 


cycle. 
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6. OPERATION COUNT AND STORAGE REQUIREMENTS 

In most applications, N, the dimension of the vectors, is extremely large, while k takes on very 
small values. Consequently, the major part of the computational effort is spent in handling the large 
vectors, the rest being negligible. 

As we can easily see, most of the vector computations take place in the QR factorization. At the 
jtth stage that leads to s 0 .*. the vector x* + i is provided first. Starting with this, we need one vector 
addition to form u* =x k+i -x k , and, following that, k vector additions, k+\ scalar-vector multiplica- 
tions, and k+ 1 inner products to form the orthonormal vector q k and the scalars ro*,r u fa by 

MGS. The computation of s 0 ,a if desired, requires k vector additions and k scalar-vector multiplica- 
tions by (4.8). The computation of the y., &. and % is negligible, as it involves work with kxk or 
(*+l)x(*+l) triangular matrices for very small values of*. 

As for the storage requirements, it is clear that x 0 needs to be saved. At the kxh stage q k needs to 

be saved, in addition to the previously saved qo.q \ q k - 1 • We also need two or three more auxiliary 

vectors of dimension N. Similarly the elements of the matrix R k all need to be saved, but their storage 
requirements are negligible. 

In view of the above, if only Jo.x is needed for some preassigned K, then, recalling that the vec- 
tor qx need not be computed, the total operation count is '/2(K 2 +5K +2) vector additions, l A(K 2 +5K) 
scalar-vector multiplications, and ; /i(A 2 +3A+2) inner products, which amounts to ~2K 2 N floating 
point operations (scalar additions and multiplications). As for the storage requirements, we need 

(*+l)N storage locations for * 0 .<?o.<7i Qk -\ . and 2 N storage locations for two additional auxiliary 

vectors. No additional storage locations are required for so.* as $<>.* can overwrite *o at the end of the 
computation. 

In many cases it turns out that the accuracy that can be achieved with m cycles of MPE or RRE, 
each cycle being of width K, is comparable to that obtained for s o,m/r- If we compare the computa- 
tional costs of each of these strategies, we see that, roughly speaking, the former is m times less expen- 
sive computationally than the latter, and requires m times less storage. Thus, as a computational stra- 
tegy, cycling possesses important advantages. 

It is very instructive to compare the implementations for MPE and RRE, as they are given in this 
work, with the vector epsilon algorithm (VEA) of Wynn [21]. VEA is defined recursively by 
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e ( "i )= 0 and eft 0 = x „ , n = 0,l,... , 

, k> 0, n>0. 


ei" + >,=ei"4 1) + 


Aei' 


;«) 


(Ac{*\ Ae]^) 


( 6 . 1 ) 


where Aei" ) = ei' ,+1) -ei" ) , and z=(z\ Jn) t if z = (z i z*/) 7 . Thus, the computation of ei a) for 

k 22 requires two vector additions, one scalar-vector multiplication, and one inner product. For ef } 
only one vector addition is required. Now as is suggested by experience and as can be justified heurist- 
ically, for given K, for VEA and s Q K for MPE or RRE would have comparable performance. The 
total operation count for determining is 4 K 2 vector additions, 2 K 2 +K scalar-vector multiplica- 
tions, and 2K 2 +K inner products, which amounts to ~10K 2 N floating points operations (scalar addi- 
tions and multiplications). As for the storage requirements, we reed (2K+\)N storage locations to 

save and 2 N storage locations for two auxiliary vectors. Consequently, VEA is 

about live times more expensive than either MPE or RRE as far as operation counts are concerned. As 
far as storage requirements are concerned, VEA is about twice as expensive as either MPE or RRE. In 

addition, since x 0 ,xi x 2 k are needed for e^, whereas, only jco.jei are needed for 

either MPE or RRE, VEA is about twice as expensive as MPE or RRE with respect to the number of 
vectors they utilize. 

We note that, in the epsilon family of vector extrapolation methods, VEA seems to be the most 
advantageous as far as the operation count, storage requirements, and numerical stability are con- 
cerned. For more details, see [19]. 


7. SOME PRACTICAL CONSIDERATIONS FOR ENHANCING CONVERGENCE AND 
STABILITY 

In this section we would like to make a few remarks, which we believe are of practical impor- 
tance with regard to enhancing the convergence and stability of vector extrapolation methods as they 
are applied to iterative procedures. Most of these remarks are based on the known theoretical results 
concerning vector extrapolation methods, some of which have been discussed in Section 3. 

7.1 Effect of Iteration Before Extrapolation 

In most problems of interest the vector sequence *o.-*i converges extremely slowly so that 

there is not much difference between II jc„ — j II and IIjco-j II even for appreciably large values of n. 
The result in (3.6), however, suggests that there may be a large difference between II s II and 
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II Xk -s II (hence II jc 0 — ^ II ) if n is sufficiently large. If the vectors xj are produced by an iterative pro- 
cedure such as (3.2), then this implies that it may be very useful to start the extrapolation procedure 
after a number of iterations with (3.2). One heuristic argument in favor of this strategy runs as fol- 
lows: The initial error xo-s, in general, has components in the direction of all eigenvectors and princi- 
pal vectors of A. After a few iterations the components in the direction of those eigenvectors and prin- 
cipal vectors corresponding to zero eigenvalues of A are totally eliminated, while those corresponding 
to the eigenvalues that are close to zero are diminished. Consequently, the error vector x n —s has 
mostly contributions from the eigenvectors and principal vectors corresponding to the large eigen- 
values. Precisely these contributions are now diminished by the extrapolation procedure. 

7.2 A Simple "Averaging" of the Iteration Process and Its Effect on Convergence and Stability 

Assume that (3.2) or (5.3) result from the discrete solutions of continuum problems. Then, for a 
convergent scheme, the largest eigenvalues of A or of F'is), the Jacobian matrix of F(x) at x = s, may 
be very close to 1 in the complex plane in some cases. This may cause the extrapolation process not to 
be very effective. The process may even suffer from a large amount of numerical instability. 

One way of dealing with this problem is by applying extrapolation methods not to the sequence 

xq,x\,x 2 but to y 0 .>’i.)’2 where yj~ x jp - for 150,06 positive integer p. This strategy has been 

successfully implemented in [17]. 

Another way would be by changing (5.3), in general, to read 

=X; + d)(F(Xj}-Xj) = (1-©)*, + ©F(x 7 ), > = 0,1 (7.1) 

where © is a scalar different than 1. (The sequence generated by taking co = 1 is the one generated by 
(5.3).) Thus x J+ \ is now a weighted "average" of x } and F(x } ), in which the weights 1-© and © need 
not be both positive. 

By picking to appropriately we can cause the spectrum of the Jacobian matrix of 
(1 — co)x + ©F(x) at x = s, namely, (1-©)/ + ©F'(s). to be increasingly favorable to s n k for large values 
of n. 

Let us take a look at the following example: Suppose the eigenvalues of F"(s) are all positive and 
lie in the interval [e,l-r|] for some e>0 and q > 0 close to zero. Consequently, the sequence 
xo.xi,. ., obtained from (5.3) converges, provided xo is sufficiently close to s in case F(x) is non- 
linear, and unconditionally in case F(x) is linear. If we pick © = 2, then the eigenvalues of 
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(1-©)/ + toF'(j) lie in the interval [-l+2e, 1— 2r| ] so that the sequence obtained from (7.1) also con- 
verges. (If e = r|. then this sequence converges more quickly than the one obtained from (5.3).) The 
new spectrum has two important properties relevant to vector extrapolation methods: 1) The largest 
positive eigenvalue of F'(s), namely, 1— r>, has moved away from 1. 2) Negative eigenvalues close to 
-1 have been created. Both of these properties enhance the stability of vector extrapolation processes 
both mathematically and numerically. (This follows from [12, Theorem 4.1], [16, Theorem 3.2], and 
[18. Theorems 4.1 and 5.2].) It should be noted that 2 is also that value of to for which the spectral 
radius of (1-©)/ + ©F'(s) is minimal when e = q. 

7.2.1 Special Considerations for Linear Systems 

When F (x) = Ax+b, and the vector sequence is generated by the iterative procedure in (7.1), the 
approximations .y, k are independent of ©, as has been shown by Israeli and Sidi [8], That is to say, 
the convergence properties of the s o,* arc not changed by varying ©. Nevertheless, varying w may 
influence the stability properties of the numerical implementations. 

First, if the sequence obtained from (3.2) is divergent, then all the computations leading to s 0 * 
will suffer a large loss of accuracy, especially for increasing k. By changing © in (7.1) appropriately, 
we can cause the sequence to converge (or diverge very slowly), thus avoiding the numerical problem 
caused by the unboundedness of the original sequence. 

Next, if the sequence obtained from (3.2) is slowly converging on account of the largest eigen- 
values of A all being very close to 1 in the complex plane, then the vectors «o,«i,«2* are near being 
linearly dependent. Consequently, the / 2 condition number of the matrices U k may be very large. This 
may have a negative influence on the QR factorization of U k by MGS that we have chosen for our 
implementation. This influence exhibits itself in the computed matrices Q k being far from unitary and 
the computed s 0 .* not being very accurate. If, by picking w appropriately in (7.1), we can change the 
spectrum in such a way that it now contains both positive and negative large eigenvalues, then the vec- 
tors uq,u 1 ,u 2 _. . will be far from being linearly dependent numerically. This will result in better con- 
ditioned matrices U k , which, in turn, will result in the computed matrices Q k being closer to unitary 
and the computed s 0 ,* being quite accurate. 

The numerical aspects of MGS and its use in the solution of least squares problems and the com- 
parison of these with the Householder QR factorization and least squares solutions are discussed at 
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length in [7, Sections 5.2.8, 5.2.9, and 5.3.6]. 


7.2.2 Application to Jacobi Iteration for Consistently Ordered Matrices 

The observations above can be used very effectively in the solution of linear systems whose 
matrices are consistently ordered. Such matrices arise frequently, for example, in the finite difference 
solutions of elliptic equations. 

Suppose iterative methods of the form (3.2) are being used in the solution of such a system. If 
the method used is the Jacobi iteration method, then it is known that the nonzero eigenvalues of A 
come in pairs of the form ± |i, see, e.g., Varga [20, Chapter 4], Consequently, if the eigenvalues of A 
are real, then they are in the interval [-1+6, 1-5] for some 6, 0 < 5 < 1, provided p(A) <1. As a 
result, the nonzero eigenvalues of A 2 are in the interval [ 6 , 1 - 1 )], for some e>0, where 
1-rt = (1— 5) 2 = 1-26 if 6 < 1. Furthermore, if 2M is the number of the distinct nonzero eigenvalues of 
A, then the number of the distinct nonzero eigenvalues of A 2 is M whether the eigenvalues of A are 
real or not. 

This implies that the approximation £ 2 *. ix obtained from the Jacobi iterative method and the 
approximation s 2 * obtained from the double Jacobi iterative method 


y = AXj + b 

Xj+ 1 — Ay + b , j 0, 1 , ..., 


(7.2) 


have the same asymptotic behavior as n -» ». In addition, since the largest eigenvalues of A 2 are 
twice as far from 1 as those of A, j 2 * is more stable than s\ n ^k as n -» » both mathematically and 
numerically. 

We can now couple the double Jacobi iteration method with the simple averaging procedure that 
was discussed above. Tha new iteration procedure then is 


y = AXj + b 

z =Ay+b (7.3) 

X J + 1 = ( 1 — co)jc y + coz , j = 0,1 

for some to * 0. As explained before, by varying co we can cause the spectrum of the iteration matrix 
of (7.3), namely, (1— <o)/ + cuA 2 , to become favorable to s Hik . In particular, by picking (0 = 2 we can 
cause this spectrum to lie in the interval [-l+2e, l-2t)] = [-l+2e, 1 — 45-+S 2 ]. This enlarges the distance 
of the largest positive eigenvalue of the Jacobi iteration matrix A from 1 even further, and introduces 
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negativc eigenvalues close to -1. This causes to become more stable. Furthermore, if e > 5, the 
convergence rate of x n from (7.3) with (0 = 2 is as good as that of x n from (7.2). 

We note, incidentally, that the iterative method of (7.3) with <d = 2 is known as Abramov’s 
method, see Faddeev and Faddeeva [5, p. 514], It is quite easy to see that, in this case, 

x j+ i-s = (2A 2 -I)(x r s) = T 2 (A)(x r s), 

where T 2 (k) = 2X 2 -1 is the Chebyshev polynomial of degree two. It should be emphasized that this is 
not Chebyshev acceleration, however. 

8. NUMERICAL EXAMPLES 

Wc have applied MPE and RRE through their new implementations described in the previous 
sections to several examples. This has been done by employing the computer program that is provided 
in Appendix B of this work. Some of the results obtained this way will be reported in this section. 

We have picked real linear systems of equations whose matrices are symmetric or nonsym- 
metric. Numerical results for two of these systems, one symmetric and the other nonsymmetric, are 
included in this work. 

Example I. Consider the vector sequence obtained from (3.2), where A is a 1000x1000 septadiagonal 
matrix symmetric with respect to both of its main diagonals, and is given by 


A = 0.06 x 


5 2 11 
2 6 3 1 1 
1 3 6 3 1 1 
1 1 3 6 3 1 1 
1 1 3 6 3 1 1 


The vector b is such that the exact solution s of (3.1) is (1,1,..., I) 7 ". 

All eigenvalues of A are in (0,1), the smallest and the largest being 4.7279 • • • x 10 -6 and 
0.95999 • • • , respectively. Consequently, the matrix C = I-A is symmetric positive definite. Also, 
there is a large amount of clustering of eigenvalues near the smallest and the largest ones. 
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Taking x 0 = 0, we generated the vectors x t ,* 2 .—. by (7.1) once by taking to = 1 and once by tak- 
ing to = 2, and then applied MPE to these two sequences. We also applied the method of conjugate 
gradients (CG) to the linear system Cx = b starting again with x 0 = 0. The results of these computa- 
tions are shown in Table la. 

Recall that the Amoldi method becomes equivalent to CG when C is a symmetric matrix, and 
MPE, when applied to a linearly generated sequence, becomes equivalent to the Amoldi method. Also 
Soki when applied to a sequence generated linearly as in (7.1), is independent of co. Consequently, 
s 0 *. both for <o= 1 and (0 = 2, obtained from MPE, and z*. obtained from CG, are all the same 
mathematically. This is verified in Table la at least for k £ 10. The differences between the © = 1 and 
co = 2 MPE computations for k > 10 can be explained exactly as described at the end of Section 7.2.1. 
Again, as can be seen from Table la, the co = 2 MPE computation differs from the CG computation 
starting with k = 40 approximately. Since CG involves orthogonalization with respect to only one vec- 
tor, its absolute accuracy is guaranteed. On the other hand, MPE involves orthogonalization with 
respect to an ever increasing number of vectors at each stage, thus it cannot be absolutely accurate. In 
spite of this, the present implementation of MPE seems to be very stable in the sense that II so.k~s 11 
seems to be constantly decreasing with increasing k. Indeed, we have verified this by going up to 
it = 100 in both the © = 1 and co = 2 MPE computations. 

Our purpose in presenting Table la was to demonstrate the good stability properties of the new 
MPE implementation for large values of k. Otherwise, CG is the method we would normally use for 
this example, since its operation count and storage requirements are extremely small. 

In Table lb we present the results obtained for the same example with © = 2 first performing 20 
iterations and then using MPE in the cycling mode with k = 10, as explained at the end of Section 5. 
The remarkable effectiveness of this strategy is obvious. 
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1 MPE | 

CG 



(0= 1 



CM 

II 

3 



k 

'•**iy*t 

Mr(*o.*)H 

II s Q k -s II 

r u ly* 1 

Hr (Jo.*) II 

11*0*-$ H 


0 

1.46D+00 

1.46D+00 

3.16D+01 

2.92D+00 

2.92D+00 

3.16D+01 

3.16D+01 

5 

1 92D-01 

1.92D-01 

1.17D+00 

3.83D01 

3.83D-01 

1.17D+00 

1.17D+00 

10 

1.98D-02 

1 .98D-02 

1.53D-01 

3.96D-02 

3.96D-02 

1.53D-01 

1.53D-01 

15 

2.51D-03 

2.51D-03 

2.03D-02 

5.01 D-03 

5.01 D-03 

2.02D -02 

2.02D-02 

20 

4.51D-04 

*.53D-04 

3.70D-03 

6.63D-04 

6.63D-04 

2.68D-03 

2.68D-03 

25 

1.5 8 D-04 

2.26D-04 

4.43D-03 

8.78D-05 

8.78D-05 

3.52D-04 

3.52D-04 

30 

6.27D-05 

1.11D-04 

2.44D-03 

1.15D-05 

1.15D-05 

4.63D-05 

4.63D-05 

35 

2.49D-05 

3 19D-05 

6.08D-04 

1.53D-06 

1.53D06 

6.53D-06 

6.1 1 D-06 

40 

6.37D-06 

1.42D-05 

3.34D-04 

5 16D-07 

5.30D-07 

1.64D-06 

8.03D 07 

45 

7.900 06 

5 4 2D 06 

3.66D 05 

7.3 ID 08 

1 29D 07 

1.27D-06 

L06D-07 

50 

; v" 

i 

i 

1 .29 D 06 

i 

3.17Dt% 

1 

4.291) -OS 

1 85D 07 

1 

!.'9P OX 


Table la - Numerical results for Example 1, starting withjto =0. 


l 

llr(4‘ ) )ll 

iijjt'Wn 

0 

4.75 D-01 

5.91 D+00 

i 

2.00 D-04 

6.94 D-04 

2 

2.90 D-06 

8.78 D-06 

3 

4.17 D-08 

1.74 D-07 

4 

9.27 D-10 

3.70 D-09 

5 

2.18 D-l 1 

9.11 D-l 1 

6 

5.49 D-13 

2.83 D-12 

7 

4.26 D-14 

1.77 D-13 

8 

6.16D-15 

9.46 D-14 


Table lb - MPE applied to Example 1 in the cycling mode. Starting with the zero vector, first 20 
iterations are performed. Following that MPE is applied in the cycling mode with k = 10. The / 2 - 
noim of the error in the initial (zero) vector is 3.16D+01. The vectors are obtained by "averaging" the 
iterative process (3.2) with co = 2. 
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Example 2. Consider the linear nonsymmetric system of equations Cx = b, where C is the block - tri- 
diagonal matrix 



and « =-1+5, b = -1-5, 6*0. (See [10, p. 122].) Again, the vector b is such that the exact solution 
i s (i ? i i) T . The iterative method that we pick for this system is Jacobi’s method, so that 


Now the matrix C is consistently ordered. Thus the suggestions put forth in Section 7.2.2 can be 


successfully employed in this case. 

In our numerical experiments we took 6 = 0.2. The matrices B and / in C were all 10x10 and C 
was 200x200, exactly as in [10]. The extrapolation method for which we give numerical results is 
RRE. We first applied RRE in the cycling mode in conjunction with Jacobi iteration. The vector 
obtained at the end of each cycle is denoted Next we applied RRE in the cycling mode in conjunc- 
tion with double Jacobi iteration. The vector obtained at the end of each cycle is denoted now. 
Finally, we applied RRE in the cycling mode extended as follows: The vector sequence is generated 
by the iterative procedure of (7.3) with to = 2, i.e., by the "averaged" double Jacobi iteration with 
to = 2. In each cycle n,+/t ( +l such iterations are performed, and extrapolation is applied to the last 
k,+2 of the vectors, i.e., in each cycle j„. *. is computed. The vector obtained at the end of each cycle 
now is denoted The index i denotes the cycle number in each case. 

In Table 2 we give the l 2 - norms of the errors ( k = 20), ( k = 10), and 

(«< = 5. K = 5 all /). Thus the number of basic Jacobi iterations performed to obtain the approxi- 
mations and in each cycle is 21, 22, and 22 respectively. We see that s 2 o and 5 io have 

comparable accuracy, as expected. The number of vector operations for s 2 q, however, is over three 
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times that forsjo. Also the storage requirement for F : ' is a Hr >ut twice tha‘ for s] ! The performance t>i 

r^ is only slightly inferior. The number of vector operations for J 5 5 is about one tenth that for T 20 , 
while its storage needs are about one third those of J 20 . 


i 

n — 0) n 

ll$20 H 

Hr 10 -r II 

llr^-rll 

1 

6 . 66 D -02 

7.47D-02 

1.34D-01 

2 

2.02D-04 

2.36D-04 

5.86D-04 

3 

2.53D-07 

4.26D-07 

1.14D-05 

4 

2.90D-10 

2.05D-09 

3.04D-08 

<; 

2.03D-12 

5.96D 12 

2.15D-10 

6 

1.35D-13 

6.48D-14 

1.07D-12 

7 

3.61 D- 14 

3.13D-14 

1.75D-14 


Table 2 - RRE applied to Example 2 in the cycling mode. The initial vector is zero, and the / 2 -norm 
of the error associated with it is 1.41D+01. 
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APPENDIX A 

Lemma: Let T be an mxm hermitian positive definite matrix, and let z\,Z 2 ,...,z m be complex vari- 
ables. Denote z = (Z[,Z 2 ,...,z m ) r . Then the solution to the problem 


minimize z T z 

m 

subject to £z, = 1 
1*1 

can be obtained by solving the linear system of m+1 equations 

T z = Xe 

m 

= i. 

i-t 

where z j z m , and X are unknowns, and 

i = (1,1 l) r . 

The unknown X turns out to be real and positive, and is given by 

X = z * T z at the solution. 
The solution of (A.2) can be achieved by first solving the system 


T h = e 


for/i = (h\,...,h m ) T , and letting 


X = 


r ) 

m 

i-t 


-i 


and finally setting 


z = Xh 


(A.l) 


(A.2) 


(A. 3) 


(A .4) 


(A. 5) 


(A. 6) 


(A.7) 


Proof: We start by expressing the problem in terms of real variables. Let us write T in the form 

7=A/ + iA\ M and N real mxm matrices. (A. 8) 


Then, by the assumption that T is hermitian, it follows that 

M t = M and N T = -N . 

Writing 


(A.9) 


z=x+iy, x and y real m -dimensional vectors , 


(A. 10) 
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and invoking (A.8) and (A.9) in z*T z, we have 

z*T z =x t M x + y T M y + 2y T N x . 

m 

Now the constraint = 1 in (A.l) is equivalent to two real constraints, namely, 

;=i 

= 1 and £yi=0. 

i=l i=l 


(All) 


(A. 12) 


We now use the method of Lagrange multipliers to minimize (A.l 1) subject to (A.12). Introducing 

m m 

the Lagrange multipliers -2 p and -2v for the constraints = 1 and £y, = 0, respectively, and tak- 

i=l i=l 

ing derivatives with respect to the x t and y t , we obtain the linear system of equations 


Mx - Ny - |i e = 0 
My + Nx — v e = 0, 


(A. 13) 


which, upon letting X = p + iv, becomes equivalent to Tz = X e. We have thus shown the truth of 

m 

(A.l). Multiplying Tz = X e on the left by z\ and using = 1, we obtain (A.4). Obviously, X has 

1=1 

to be strictly positive. For if X were zero, then 0 would have to be the solution as T is hermitian 

m 

positive definite, but this would contradict the constraint £z, = 1. The rest of the proof follows easily 

1=1 

from (A.2), and we shall omit it. 0 
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APPENDIX B 

In this appendix we give a computer code written in standard FORTRAN 77 that implement 
MPE and RRE as described in the present work. 

The implementation of MPE and RRE is done in SUBROUTINE MPERRE that forms the heart 
of this code. 

Use of MPE and RRE in the cycling mode is made possible by SUBROUTINE CYCLE. 

The vector sequence for extrapolation is generated by calling SUBROUTINE VECTOR, which, 
in the present code provides the iteration sequence of Example 1 with to = 2 weighting. 

The driving program in the present code is the one that generates some of results shown in Table 
lb. 

We give no further explanations about the code and its use, as the different parts of the code are 
documented in detail. 



non n n n a 
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£★**★*★*★**★★**★*************•************************************* 

C IMPLEMENTATION OF MPE AND RRE WITH QR FACTORIZATION FOR LEAST 
C SQUARES. (QR PERFORMED BY MODIFIED GRAM-SCHMIDT PROCESS) 

C MPE AND RRE ARE APPLIED IN THE CYCLING MODE. 

£******************» ****** *★★»** ★★*★*★*****★**★***★★**★★ ********** 

C THE COMPONENTS OF THE INITIAL VECTOR X, NAMELY, X ( I ) , 1*1 , . . . , ND 
C CAN BE PICKED RANDOMLY. WE ACHIEVE THIS, E.G., BY INVOKING THE 
C IMSL VERSION 10 SUBROUTINE DRNUN THAT GENERATES PSEUDORANDOM 
C NUMBERS FROM A UNIFORM (0,1) DISTRIBUTION. 

C OTHER CHOICES FOR X ( 1 ),..., X (NDIM) ARE POSSIBLE, SUCH AS X(I)-0, 

C I~1 , . . . , NDIM . IN THIS CASE REPLACE THE STATEMENT 
C CALL DRNUN (NDIM, X) 

C BY THE DO LOOP 
C DO 10 1=1, NDIM 

C X ( X) -0 

C 10 CONTINUE 

£*★★★*★★*★★★*★******************************************************* 


t ★ * ★ » * 


**»*■*■ * 
IM, 


IMPLICIT DOUBLE PRECISION (A-H,0-Z) 

PARAMETER (METHOD=l , N0=20 , N=0 , KMAX-1 0 , NCYCLE=1 5 , NDIM=1000) 
PARAMETER (EPSC-1D-10, IPRES-1, IPRES1-1) 

DIMENSION X (NDIM) , S (NDIM) , Y (NDIM) , Z (NDIM) 

DIMENSION Q (NDIM, 0 : KMAX-1) , R (0 : KMAX, 0 : KMAX) 

DIMENSION C ( 0 : KMAX ) , GAMMA ( 0 : KMAX ) , XI (0 : KMAX-1) 

EXTERNAL VECTOR 


INITIAL VECTOR DETERMINATION. 


CYC0C010 
CYC00020 
CYC00030 
CYC00040 
CYC00050 
CYC00060 
CYC00070 
CYC00080 
CYC00090 
CYC00100 
CYC001 10 
CYC0C120 
CYCC0130 
CYC00140 
CYC00150 
CYC0C160 
CYCC0170 
CYC0C1S0 
CYC00190 
CYC00200 
CYC0C2 10 
CYC00220 
CYC00230 
CYC002 40 
CYC00250 
CYC002 SO 
CYC00270 


CALL DRNUN (NDIM, X) 

DO 10 1=1, NDIM 
X(I)=0 

10 CONTINUE 

END OF INITIAL VECTOR DETERMINATION. 


CYC0C280 

CYC00290 

CYC00300 

CYCOC310 

CYC00320 

CYC0C330 

CYC00340 


c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


CALL CYCLE (METHOD, X, S, NO, N, KMAX, NCYCLE , NDIM, Y, Z, VECTOR, Q, R, 
*C, GAMMA, XI , RESC, EPSC, IPRES, IPRES1) 

STOP 

END 


SUBROUTINE CYCLE (METHOD, X, S, NO, N, KMAX, NCYCLE, NDIM, Y, Z, VECTOR, Q, R, 
*C, GAMMA, XI, RESC, EPSC, IPRES, IPRES1) 


*★*★★********★+**★★★***★★*■ 


*■»■**★★★■**■*★★★★»*■* + **■****★*****■*■******•***** 


CYC0C350 
CYC0C36C 
CYC00370 
CYC00390 
CYC003 90 
CYC00400 
CYC004 10 
*CYC004 20 


THIS SUBROUTINE APPLIES MPE AND RRE IN THE CYCLING MODE. 

MPE AND RRE ARE INVOKED BY CALLING SUBROUTINE MPERRE . 
*********************•***★»******★*-***★★★***★*****★★-***★***★»*★******** 

THE ARGUMENTS METHOD , NDIM, Y , Z , VECTOR, Q, R, C, GAMMA, XI , IPRES , IPRES1 
ARE AS IN SUBROUTINE MPERRE. 

X : INITIAL VECTOR. INPUT ARRAY OF DIMENSION NDIM. (DOUBLE 

PRECISION) 

S : THE FINAL APPROXIMATION PRODUCED BY THE SUBROUTINE. OUTPUT 

ARRAY OF DIMENSION NDIM. (DOUBLE PRECISION) 

NO : NUMBER OF ITERATIONS PERFORMED BEFORE CYCLING IS STARTED, 
I.E., BEFORE MPE OR RRE IS APPLIED FOR THE FIRST TIME. 

INPUT. (INTEGER) 

N : NUMBER OF ITERATIONS PERFORMED BEFORE MPE OR RRE IS APPLIED 

IN EACH CYCLE AFTER THE FIRST CYCLE. INPUT. (INTEGER) 

KMAX : WIDTH OF EXTRAPOLATION. ON EXIT FROM SUBROUTINE MPERRE IN 
EACH CYCLE, THE ARRAY S IS, IN FACT, THE APPROXIMATION 
S (NO, KMAX) IN THE FIRST CYCLE, AND S(N,KMAX) IN THE FOLLOWING 
CYCLES. INPUT. (INTEGER) 

NCYCLE: MAXIMUM NUMEER OF CYCLES ALLOWED. INPUT. (INTEGER) 

RESC : L2-NOPM OF THE RESIDUAL FOR S AT THE END OF EACH CYCLE. 

RETRIEVED AT THE END OF THE NEXT CYCLE. OUTPUT. (DOUBLE 
PRECISION) 

EPSC : AN UPPER BOUND ON RESC/RESP, SOME RELATIVE RESIDUAL FOR S, 


CYC004 30 
CYC0C440 
CYC004 5 0 
CYC0C460 
CYCCC470 
CYC004 80 
CYC004 90 
CYC00500 
CYC00510 
CYC00520 
CYC0C530 
CYC00540 
CYC005 50 
CYC00560 
CYC00570 
CYC00580 
CYC00590 
CYC00600 
CYC006 10 
CYC00620 
CYCC063C 
CYC00640 
CYCOCt 50 
CYCCC660 


oooooooo 
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USED IN THE STOPPING CRITERION. HERE RESP IS THE L2-NORM 
OF THE RESIDUAL FOR S(N0,KMAX) AT THE END OF THE FIRST 
CYCLE, I.E., ON EXIT FROM SUBROUTINE MPERRE THE FIRST TIME. 
IF RESC . LE . EPSC*RESP AT THE END OF SOME CYCLE, THEN ONE 
ADDITIONAL CYCLE IS PERFORMED, AND THE CORRESPONDING 
S (N, KMAX) IS ACCEPTED AS THE FINAL APPROXIMATION, AND THE 
SUBROUTINE IS EXITED. INPUT. (DOUBLE PRECISION) 
********************************★**«****.*******.******************** 

IMPLICIT DOUBLE PRECISION (A-H,0-Z) 

PADfiMPTFD 

DIMENSION X (NDIM) , S (NDIM) , Y (NDIM) , Z (NDIM) 

DIMENSION Q (NDIM, 0 :KMAX-1) , R (0 : KMAX, 0 : KMAX) 

DIMENSION C(0:KMAX) , GAMMA (0: KMAX) , XI (0:KMAX-1) 

EXTERNAL VECTOR 
DO 40 IC”1 , NCYCLE 

IF (IPRES.EQ. 1 .OR. IPRES1 .EQ. 1) THEN 
WRITE(6, 101) IC 

101 FORMAT*/,' CYCLE NO. ',13) 

END IF 

NN-N 

IF (IC.EQ.l) NN-N0 

IF ( IPRES . EQ . 1 . OR . IPRES1 . EQ . 1 ) THEN 
WRITE (6, 102 ) NN 

102 FORMAT*/,' NO. OF ITERATIONS PRIOR TO EXTRAPOLATION IS', 13) 
WRITE (6, 103) KMAX 

103 FORMAT*/,' WIDTH OF EXTRAPOLATION IS ',13) 

END IF 

DO 20 J-0.NN-1 
CALL VECTOR (X, Y, NDIM) 

DO 10 I-1,NDIM 
X * I ) -Y (I ) 

10 CONTINUE 
20 CONTINUE 

CALL MI ERRE (METHOD, X, S , KMAX, KOUT, NDIM, Y, Z , VECTOR, Q, R, C, 

* GAMMA, XI , RES , RES 1 , EPS, IPRES, IPRES1) 

IF (IC.EQ.l) RESP=R (0,0) 

RESC-R (0,0) 

IF (RESC . LE . EPSC*RESP) RETURN 

DO 30 I“l, NDIM 

X(I)-S(I) 

30 CONTINUE 
40 CONTINUE 
RETURN 
END 

SUBROUTINE MPERRE (METHOD , X, S , KMAX, KOUT, NDIM, Y, Z, VECTOR, Q, R, C, 
*GAMMA, XI, RES, RES1, EPS, IPRES, IPRES1) 


CYC00670 
CYC00680 
CYC00690 
CYC00700 
CYC007 10 
CYC00720 
CYC00730 
•CYC00740 
CYC00750 
CYC00760 
CYC00770 
CYC00780 
CYC00790 
CYC00800 
CYC008 10 
CYC00820 
CYC00830 
CYC00840 
CYC008 50 
CYC008 60 
CYC00870 
CYC008 80 
CYC00890 
CYC0090.0 
CYC00910 
CYC00920 
CYC00930 
CYC00940 
CYC00950 
CYC00960 
CYC00970 
CYC00980 
CYC00990 
CYC01000 
CYC01010 
CYC01020 
CYC01030 
CYC01040 
CYC01050 
CYC01060 
CYC01070 
CYC01080 
CYC01090 
CYC01100 
CYC01110 
CYC01120 
CYC01130 


£* + *★*****★★*******★★★★★***★★**★★***★★★**★********★*********■**★****■#*★**£^001140 


C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


THIS SUBROUTINE APPLIES THE MINIMAL POLYNOMIAL EXTRAPOLATION (MPE) 

OR THE REDUCED RANK EXTRAPOLATION (RRE) METHODS TO A VECTOR 
SEQUENCE X0,X1,X2, . . THAT IS OFTEN GENERATED BY A FIXED POINT 
ITERATIVE TECHNIQUE. 

BOTH MPE AND RRE ARE ACCELERATION OF CONVERGENCE (OR EXTRAPOLATION) 
METHODS FOR VECTOR SEQUENCES. EACH METHOD PRODUCES A TWO-DIMENSIONAL 
ARRAY S(N,K) OF APPROXIMATIONS TO THE LIMIT OR ANTILIMIT OF THE 
SEQUENCE IN QUESTION. 

THE IMPLEMENTATIONS EMPLOYED IN THE PRESENT SUBROUTINE GENERATE 
THE SEQUENCES S (0, 0 ) -X0, S (0 , 1 ) , S (0, 2) , . . . . 

r***********************************il 

AUTHOR 




; AVRAM SIDI 

COMPUTER SCIENCE DEPARTMENT 
TEC HN I ON- ISRAEL INSTITUTE OF TECHNOLOGY 
HAIFA 32000, ISRAEL 

E-MAIL ADDRESS: CSSS ID I0TECHNION . BITNET CYC 

r*********************-******* + ************ + ***^****************** + ***** 


CYC01150 
CYC01160 
CYC01170 
CYC01I60 
CYC01190 
CYC01200 
CYC012 10 
CYC01220 
CYC01230 
CYC01240 
CYC012 50 
CYC01260 
CYC01270 
CYC01280 
CYC01290 


l 300 


C METHOD: IF METHOD. EQ.l, THEN MPE IS EMPLOYED. IF METHOD. EQ. 2, THEN 


CYC01310 

CYC01320 
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c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


RRE IS EMPLOYED. INPUT., (INTEGER) 

X : THE VECTOR XO . INPUT ARRAY OF DIMENSION NDIM . (DOUBLE 

PRECISION) 

S • THE APPROXIMATION S(0,K) PRODUCED BY THE SUBROUTINE FOR 

EACH K. ON EXIT, S IS S(0,KOUT). OUTPUT ARRAY OF DIMENSION 
NDIM. (DOUBLE PRECISION) 

KMAX : A NONNEGATIVE INTEGER. THE MAXIMUM WIDTH OF EXTRAPOLATION 
ALLOWED. THUS THE NUMBER OF THE VECTORS XO , XI , X2 , . . . , 

EMPLOYED IN THE PROCESS IS KMAX+2 AT MOST. INPUT. (INTEGER) 
KOUT : A NONNEGATIVE INTEGER. KOUT IS DETERMINED BY A SUITABLE 
STOPPING CRITERION, AND DOES NOT EXCEED KMAX. THE VECTORS 
ACTUALLY EMPLOYED BY THE EXTRAPOLATION PROCESS ARE 
XO, XI, X2, . . . , XP, WHERE P-KOUT+1 . OUTPUT. (INTEGER) 

NDIM : DIMENSION OF THE VECTORS. INPUT. (INTEGER) 

Y : WORK ARRAY OF DIMENSION NDIM. (DOUBLE PRECISION) 

Z ! WORK ARRAY OF DIMENSION NDIM. (DOUBLE PRECISION) 

VECTOR- A USER-SUPPLIED SUBROUTINE WHOSE CALLING SEQUENCE IS 
CALL VECTOR (Y,Z, NDIM) ; Y, NDIM INPUT, Z OUTPUT. 

Y , Z , NDIM ARE EXACTLY AS DESCRIBED ABOVE. FOR A FIXED POINT 
ITERATIVE TECHNIQUE FOR SOLVING THE LINEAR OR NONLINEAR 
SYSTEM T=F(T), DIM(T)=NDIM, Y AND Z ARE RELATED BY Z— F(Y) . 
THUS Xl-F(XO), X2-F(X1), ETC. 

VECTOR SHOULD BE DECLARED IN AN EXTERNAL STATEMENT IN THE 
CALLING PROGRAM. 

Q : WORK ARRAY OF DIMENSION (NDIM, 0 : KMAX- 1 ) . FOR EACH K, ITS 

ELEMENTS ARE THOSE OF THE ORTHOGONAL MATRIX OBTAINED FROM 
QR FACTORIZATION OF THE MATRIX U 

U - ( BO I 01 I ... I UK ), K-0,1,2,..., 

WHERE U0=X1-X0, U1-X2-X1, U2-X3-X2, ETC. OUTPUT. (DOUBLE 
PRECISION) 

R : WORK ARRAY OF DIMENSION ( 0 : KMAX, 0 : KMAX) . FOR EACH K, ITS 

ELEMENTS ARE THOSE OF THE UPPER TRIANGULAR MATRIX OBTAINED 
FROM QR FACTORIZATION OF THE MATRIX U DESCRIBED ABOVE. 
OUTPUT. (DOUBLE PRECISION) 

C : WORK ARRAY OF DIMENSION (0:KMAX). FOR EACH K, C FOR MPE IS 

THE LEAST SQUARES SOLUTION OF THE SYSTEM U*C-0 SUBJECT TO 
THE CONSTRAINT C(K)=1. (DOUBLE PRECISION) 

GAMMA : WORK ARRAY OF DIMENSION (0:KMAX). FOR EACH K, THE GAMMA'S 
ARE SUCH THAT 

S (0, K) -GAMMA (0) *XC+GAMMA (1) *X1+ . . ,+GAMMA (K) *XK. 

FOR EACH K, GAMMA FOR RRE IS THE LEAST SQUARES SOLUTION OF 
THE SYSTEM U* GAMMA- 0 SUBJECT TO THE CONSTRAINT 
GAMMA (0 ) +GAMMA ( 1 ) + . . . +GAMMA (K) “1 . (DOUBLE PRECISION) 

XI : WORK ARRAY OF DIMENSION (0:KMAX-1). FOR EACH K, THE XI'S 
ARE SUCH THAT 

S (0, K) -X0+XI (0) *U0+XI (1) *U1+. . .+XI <J) *UJ, J-K-l . 

(DOUBLE PRECISION) 

RES : L2-NORM OF THE RESIDUAL FOR S(0,K) FOR A LINEAR SYSTEM 
T=A*T+B (OR AN ESTIMATE FOR IT FOR A NONLINEAR SYSTEM 
T-F(T)) FOR EACH K. ON EXIT, THIS K IS KOUT. OUTPUT. 

(DOUBLE PRECISION) 

RES 1 : L2-NORM OF THE RESIDUAL ACTUALLY COMPUTED FROM S(0,K) FOR 

EACH K. (THE RESIDUAL VECTOR FOR ANY VECTOR VEC IS TAKEN 
AS (F (VEC) -VEC) . ) ON EXIT, THIS K IS KOUT. OUTPUT. 

(DOUBLE PRECISION) 

EPS : AN UPPER BOUND ON RES/R<0,0), THE RELATIVE RESIDUAL FOR S, 
USED IN THE STOPPING CRITERION. NOTE THAT R ( 0 , 0 ) -L2-NORM 
OF THE RESIDUAL FOR XO, THE INITIAL VECTOR. IF, FOR SOME K, 
RES .LE.EPS*R(0, 0) , THEN THE CORRESPONDING S(0,K) IS ACCEPTED 
AS THE FINAL APPROXIMATION, AND THE SUBROUTINE IS EXITED 
WITH KOUT=K . IF S(0,KMAX) IS NEEDED, THEN EPS SHOULD BE 
SET EQUAL TO ZERO. INPUT. (DOUBLE PRECISION) 

IPRES : IF IPRES.EQ.i, THEN RES IS PRINTED FOR ALL K, K-0,1,... . 

OTHEWISE, IT IS NOT. INPUT. (INTEGER) 

IPRES1 : IF IPRES1.EQ.1, THEN RES1 IS COMPUTED AND PRINTED FOR ALL 
K, K=0 , 1 OTHERWISE, IT IS NOT . INPUT. (INTEGER) 


CYC01330 

CYC01340 

CYC01350 
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CYC01490 
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CYC01650 

CYC01660 
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CYC01710 

CYC01720 
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CYC01750 

CYC01760 

CYC01770 

CYC01780 

CYC01790 

CYC01800 

CYC01810 

CYC01820 

CYC0183C 

CYC01840 

CYC01850 

CYC01B6C 

CYC0187C 
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CYC01890 
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£****+****************************************************************♦* CYC01990 
C THE ABOVE MENTIONED QR FACTORIZATION IS PERFORMED BY EMPLOYING CYC02000 
C THE MODIFIED GRAM-SCHMIDT PROCESS. CYC02010 
Cur********************************************************************** CYC02020 


IMPLICIT DOUBLE PRECISION (A-H,0-Z) 

PARAMETER (EPS1-1D-32, EPS2-1D-16) 

DIMENSION X (NDIM) , S (NDIM) , Y (NDIM) , Z (NDIM) 

DIMENSION Q (NDIM, 0 :KMAX-1) ,R(0:KMAX, 0:KMAX) 

DIMENSION C (0 :KMAX) , GAMMA (0 : KMAX) , XI (0:KMAX-1) 

IF ( IPRES . EQ . 1 .AND . I PRES 1 . EQ . 1 ) THEN 
WRITE (6, 301) 

301 FORMAT (/,' K RES RES1' ) 

ELSE IF (IPRES. EQ.l. AND. IPRES1.NE.1) THEN 
WRITE (6, 302) 

302 FORMAT (/,' K RES') 

ELSE IF ( IPRES. NE.l. AND. IPRES1. EQ.l) THEN 
WRITE (6,303) 

303 FORMAT (/,' K RES1' ) 

END IF 

DO 10 1-1 , NDIM 
Y(I)-X(I) 

10 CONTINUE 

DO 250 K-0,KMAX 

COMPUTATION OF THE VECTOR XJ, J-K+l, FROM XK, AND COMPUTATION OF UK 

CALL VECTOR { Y, Z, NDIM) 

DO 20 I-l, NDIM 
Y(I)-Z(I)-Y(I) 

20 CONTINUE 

DETERMINATION OF THE ORTHONORMAL VECTOR QK FROM UK BY THE MODIFIED 
GRAM-SCHMIDT PROCESS 

DO 50 J-0, K-l 
SUM-0 

DO 30 1-1, NDIM 
SUM-SUM+Q ( I , J ) * Y (I) 

30 CONTINUE 

R ( J, K) -SUM 

DO 40 I— 1 , ND IM 

Y ( I ) -Y ( I ) -R ( J, K) *Q(I, J) 

40 CONTINUE 

50 CONTINUE 

SUM-0 

DO 60 I-l , NDIM 
SUM-SUM+Y (I) **2 
60 CONTINUE 

R(K,K)-DSQRT(SUM) 

IF (R(K,K) ,GT.EPS1*R(0, 0) .AND. K.LT. KMAX) THEN 
HP-1D0 /R (K, K) 

DO 70 1-1 , NDIM 
Q(I, K) — HP*Y (I) 

70 CONTINUE 

ELSE IF (R(K,K) . LE . EPS1*R ( 0 , 0 ) ) THEN 
EEE-EPS1 

WRITE (6, 304 ) K, K, EEE 

304 FORMAT ( / , ' R ( ' , 13, ' , ' , 13, ' ) . LE . ' , IP, D8 . 1 , ' *R (0, 0) 

END IF 

END OF COMPUTATION OF THE VECTOR QK 
IF (METHOD .EQ. 1) THEN 
COMPUTATION OF THE GAMMA'S FOR MPE 
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do 90 I-K-1,0,-1 
Cl— R(I,K) 

DO 80 J-I+1,K-1 
CI-CI-R(I, J) *C(J) 

80 CONTINUE 

C ( I ) -CI/R(I, I) 

90 CONTINUE 
C (K) -1D0 
SUM-0 

DO 100 I-0,K 
SUM-SUM+C ( I ) 

100 CONTINUE 

IF (DABS (SUM) .LE.EPS2) THEN 
WRITE (6, 311) K 

311 FORMAT (/,' S( 0,',I3,') IS NOT DEFINED.',/) 
GO TO 250 
END IF 

DO 110 I-0,K 
GAMMA (I) -C (I) /SUM 
110 CONTINUE 

RES-R (K, K) *DABS (GAMMA (K) ) 

C 

C END OF COMPUTATION OF THE GAMMA'S FOR MPE 
C 

ELSE IF (METHOD. EQ . 2) THEN 
C 

C COMPUTATION OF THE GAMMA' S FOR RRE 
C 

DO 130 I-0,K 
CI-1D0 

DO 120 J-0,1-1 
CI-CI-R ( J, I) *C ( J) 

120 CONTINUE 

C(I)“CI/R(I, I) 

130 CONTINUE 

DO 150 I-K, 0,-1 
CI-C(I) 

DO 140 J-I+1,K 
CI-CI-R ( I, J) *GAMMA ( J) 

140 CONTINUE 

GAMMA ( I ) =C I/R ( I , I) 

150 CONTINUE 
SUM-0 

DO 160 1=0, K 
SUM-SUM+GAMMA ( I ) 

160 CONTINUE 

DO 170 1=0 , K 
GAMMA ( I ) -GAMMA ( I )/ SUM 
170 CONTINUE 

RES-1D0/DSQRT (DABS (SUM) ) 

C 

C END OF COMPUTATION OF THE GAMMA'S FOR RRE 
C 

END IF 
KOUT-K 

IF ( IPRES . EQ , 1 . AND . IPRES1 . NE . 1 ) THEN 
WRITE ( 6 , 321) K, RES 
321 FORMAT (13, 2X, IP, D15 . 2) 

END IF 

IF ( RE S . LE . E P S * R ( 0 , 0 ) ,OR.R(K,K) . LE . EPS1*R ( 0, 0) 
* .OR. K. EQ . KMAX.OR . IPRES 1 . EQ . 1) THEN 
C 

C COMPUTATION OF THE APPROXIMATION S(0,K) 

C 

XI ( 0 ) -1D0 -GAMMA ( 0 ) 

DO 180 J-l , K-l 
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XI (J)-XI (J-l) -GAMMA (J) 

CYC03310 

180 

CONTINUE 

CYC03320 


DO 190 I“1 , NDIM 

CYC03330 


S(I)-X<I> 

CYC03340 

190 

CONTINUE 

CYC03350 


DO 220 J-0,K-1 

CYC03360 


HP-0 

CYC03370 


DO 200 I-J, K-l 

CYC03380 


HP-HP+R( J, I) *XI (I) 

CYC03390 

200 

CONTINUE 

CYC03400 


DO 210 1-1 , NDIM 

CYC034 10 


S ( I) -S (I ) +HP*Q (I , J) 

CYC03420 

210 

CONTINUE 

CYC034 30 

220 

CONTINUE 

CYC03440 

C 


CYC03450 

C END OF COMPUTATION OF THE APPROXIMATION S(0,K) 

CYC034 60 

C 


CYC03470 


END IF 

CYC034 80 


IF (IPRES1.EQ. 1) THEN 

CYC034 90 

C 


CYC03500 

C EXACT COMPUTATION OF RESIDUAL L2-NORM. 

CYC03510 

C 


CYC03520 


CALL VECTOR < S, Y, NDIM) 

CYC03530 


RES1-0 

CYC03540 


DO 230 I“1 , NDIM 

CYC03550 


RES1-RES1+(Y(I)-S(I) ) * *2 

CYC03560 

230 

CONTINUE 

CYC03570 


RES1-DSQRT (RES1) 

CYC03580 

C 


CYC03590 

C END OF EXACT COMPUTATION OF RESIDUAL L2-NORM . 

CYC03600 

C 


CYC03610 


IF (IPRES.EQ.l) THEN 

CYC03620 


WRITE (6, 331) K, RES , RES1 

CYC03630 

331 

FORMAT (13, 2X, IP, 2D15 .2) 

CYC03640 


ELSE IF (IPRES.NE.l) THEN 

CYC03650 


WRITE (6, 332 ) K, RES1 

CYC03660 

332 

FORMAT (I3,2X,1P,D15.2) 

CYC03670 


END IF 

CYC03680 


END IF 

CYC03690 


IF (RES.LE.EPS*R(0,0) . OR ,R(K,K) . LE . EPS 1 *R ( 0 , 0) ) RETURN 

CYC03700 


DO 240 I-l , NDIM 

CYC037 10 


Y(I)-Z(I) 

CYC03720 

240 

CONTINUE 

CYC03730 

250 

CONTINUE 

CYC03740 


RETURN 

CYC03750 


END 

CYC03760 


CYC03770 


SUBROUTINE VECTOR (X f Y, NDIM) CYC03780 


C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 




THIS SUBROUTINE GENERATES THE VECTOR Z FROM THE VECTOR Y BY USING, CYC03800 
E.G., A FIXED POINT ITERATION TECHNIQUE. CYC03810 

********************************************************************* * *CYC0 3820 


IN THE PRESENT EXAMPLE THE ITERATIVE TECHNIQUE IS OF THE FORM 
Y-A1*X+B1 . HERE A1 IS AN NDIM*NDIM SEPTAD I AGONAL MATRIX SYMMETRIC 
WITH RESPECT TO BOTH OF ITS DIAGONALS, AND IS DEFINED AS 
Al- (1-OMEGA) * I+OMEGA* A, WHERE OMEGA IS A SCALAR, I IS THE 
IDENTITY MATRIX, AND A IS THE MATRIX 


A 


0.06* 


5 

2 

1 

1 


2 1 
6 3 
3 6 
1 3 
1 1 
1 


1 

1 1 
3 1 
6 3 
3 6 
1 3 


1 

1 1 
3 1 
6 3 


1 

1 1 
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CYC03930 
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C B1 IS THE VECTOR DEFINED AS Bl-OMEGA*B, THE VECTOR B BEING CHOSEN 
C SUCH THAT THE SOLUTION OF THE SYSTEM T“A*T+B IS THE VECTOR 

C THE ITERATIVE TECHNIQUE USED IS THUS RICHARDSON'S ITERATIVE 
C METHOD APPLIED TC "THE SYSTEM (I-A)*T-B. 






10 


20 


IMPLICIT DOUBLE PRECISION (A-H,0-Z) 

PARAMETER (OMEGA-2DO , TAU-1DO-OMEGA) 

DIMENSION X (NDIM) , Y (NDIM) 

N-NDIM 

Y(1)-(5*X(1) +2*X (2) +X (3) +X (4) ) ‘6D-2+46D-2 
Y(2)-(2*X(1)+6*X(2)+3*X(3)+X(4)+X(5) ) ‘6D-2+22D-2 
Y(3)-(X(1)+3*X(2)+6*X(3)+3*X(4)+X(5)+X(6)) *6D-2+lD-l 
DO 10 1=4, N-3 

Y(I)-(X(I-3)+X(I-2)+3*X(I-1)+6*X(I)+3*X(I+1)+X(I+2)+X(I+3) ) *6D- 

* +4D-2 

CONTINUE , ^ , 

Y(N-2)-(X(N)+3*X(N-l)+6*X(N-2)+3*X(N-3)+X(N-4)+X(N-5) ) ‘6D-2+1D- 
Y (N-l) -(2*X(N) + 6*X (N-l) +3*X(N-2) +X (N-3) +X (N-4 ) ) ‘6D-2+22D-2 
Y (N) - ( 5*X (N) +2*X (N-l ) +X (N-2) +X (N-3) ) *6D-2+46D-2 
DO 20 1=1, N 

Y ( I) =TAU*X (I ) +OMEGA* Y (I) 

CONTINUE 

RETURN 

END 
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cycle NO . 1 

NO. OF ITERATIONS PRIOR TO EXTRAPOLATION IS 20 
WIDTH OF EXTRAPOLATION IS 10 


K 

RES 

RES1 

0 

4.75D-01 

4 . 75D-01 

1 

5.36D-01 

5.36D-01 

2 

1.52D-02 

1.52D-02 

3 

1.93D-02 

1.93D-02 

4 

4 . 23D-03 

4.23D-03 

5 

3.79D-03 

3 . 79D-03 

6 

1. 41D-03 

1.41D-03 

7 

1 . 00D-03 

1 . 00D-03 

8 

5. 16D-04 

5.16D-04 

9 

3. 04D-04 

3 . 04D-04 

10 

2.00D-04 

2.00D-04 

CYCLE 
NO . OF 

NO. 2 

ITERATIONS PRIOR 

TO EXTRAPOLATION 

WIDTH 

OF EXTRAPOLATION 

IS 10 

K 

RES 

RES1 

0 

2 . 00D-04 

2 . 00D-04 

1 

9 . 57D-05 

9.57D-05 

2 

9. 59D-05 

9 . 59D-05 

3 

4 . 58D-05 

4 . 58D-0 5 

4 

4 . 42D-05 

4 . 4 2D-05 

5 

1 .68D-05 

1 . 68D-0 5 

6 

1 . 91D-05 

1 . 91D-05 

7 

6.49D-06 

6.49D-06 

8 

7.22D-06 

7 .22D-06 

9 

2.56D-06 

2.56D-06 

10 

2 . 90D-06 

2.90D-06 

CYCLE 
NO. OF 

NO. 3 

' ITERATIONS PRIOR 

TO EXTRAPOLATION 

WIDTH 

OF EXTRAPOLATION 

IS 10 

K 

RES 

RES 1 

0 

2.90D-06 

2 . 90D-06 

1 

1 . 18D-06 

1 . 18D-06 

2 

1 . 38D-06 

1.38D-06 

3 

6.20D-07 

6.20D-07 

4 

6.64D-07 

6.64D-07 

5 

2.43D-07 

2.43D-07 

6 

2. 63D-07 

2.63D-07 

7 

8 . 58D-08 

8.58D-08 

8 

9 . 15D-08 

9.15D-08 

9 

3.95D-08 

3.95D-08 

10 

4 . 17D-08 

4 . 17D-08 


CYCLE NO. 4 

NO. OF ITERATIONS PRIOR TO EXTRAPOLATION IS 0 
WIDTH OF EXTRAPOLATION IS 10 


K 

0 


RES 

4 . 17D-08 
2 . 44D-08 


RES1 

4 . 17D-08 
2 . 44D-08 
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2 

2 . 40D-08 

2.40D-08 

3 

1 .29D-08 

1.29D-08 

4 

1.24D-08 

1.24D-08 

5 

5.49D-09 

S.49D-09 

6 

5. 39D-09 

5.39D-09 

7 

1.95D-09 

1.95D-09 

8 

1. 96D-09 

1.96D-09 

9 

8.71D-10 

8.71D-10 

10 

9 . 27D-10 

9 .270-10 

CYCLE 
NO. OF 

NO. 5 

ITERATIONS PRIOR 

TO EXTRAPOLATION 

WIDTH 

OF EXTRAPOLATION 

IS 10 

K 

RES 

RES1 

0 

9 . 27D-10 

9.27D-10 

1 

5. 14D-10 

S.14D-10 

2 

5. 46D-10 

5.46D-10 

3 

2.74D-10 

2.74D-10 

4 

2.71D-10 

2.71D-10 

5 

1.17D-10 

1.17D-10 

6 

1.09D-10 

1.09D-10 

7 

4 . 64D-11 

4 . 64D-11 

8 

4 . 28D-11 

4 . 28D-11 

9 

2 . 07D-11 

2.07D-11 

10 

2 . 19D-11 

2.18D-11 

CYCLE 
NO. OF 

NO. 6 

ITERATIONS PRIOR 

TO EXTRAPOLATION 

WIDTH 

OF EXTRAPOLATION 

IS 10 

K 

RES 

RES1 

0 

2.18D-11 

2.18D-11 

1 

1.25D-11 

1 . 25D-11 

2 

1.26D-11 

1.26D-11 

3 

7 . 23D-12 

7.23D-12 

4 

6.32D-12 

6.32D-12 

5 

3.29D-12 

3.28D-12 

6 

2.85D-12 

2.85D-12 

7 

1.27D-12 

1 .28D-12 

8 

1.15D-12 

1 .15D-12 

9 

S.79D-13 

5 .670-13 

10 

5. 67D-13 

5.49D-13 
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